• This tutorial introduces the associationsubgraphs package and walks through a complete analysis, from the structure of the input data to the final visualization, using an example data set.

Install package and load libraries

devtools::install_github("tbilab/associationsubgraphs")

library(tidyverse)
library(associationsubgraphs)

Input data

  • We’ll use the Phecode pairs data included in the associationsubgraphs package as an example. Your input data should have the same format: a dataframe with columns a and b representing the variables (nodes), and a column strength giving a numeric measure of association strength (higher = stronger).

  • Strength represents how strongly two variables are associated with each other. In this example, each node pair is a pair of Phecodes, and the strength of their association can be measured by the odds ratio from a 2 by 2 contingency table. Remove node pairs with missing (NA) strength values before the analysis.

  • associationsubgraphs can handle large-scale input data, such as the example Phecode pairs data, which has more than 1.4 million node pairs.
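To make the strength column and the NA filtering concrete, here is a small base-R sketch that computes an odds ratio from 2 by 2 tables of co-occurrence counts and then keeps only the a, b, and strength columns for pairs with a non-missing strength. The helper odds_ratio() and all counts are hypothetical, made up for illustration; the tutorial itself only requires a numeric strength where higher means stronger, not any particular measure.

```r
# Hypothetical helper: odds ratio of a 2 x 2 contingency table of counts
#               B present   B absent
#   A present      n11         n10
#   A absent       n01         n00
odds_ratio <- function(n11, n10, n01, n00) {
  (n11 * n00) / (n10 * n01)
}

# Made-up counts for two node pairs (1000 individuals each), illustration only
toy_counts <- data.frame(
  a = c("A1", "A2"),
  b = c("B1", "B2"),
  n11 = c(20, 0), n10 = c(80, 0), n01 = c(10, 15), n00 = c(890, 985),
  stringsAsFactors = FALSE
)
toy_counts$strength <- with(toy_counts, odds_ratio(n11, n10, n01, n00))
toy_counts$strength  # 22.25 for the first pair; NaN (0/0) for the second

# Keep the input columns and drop pairs with a missing strength;
# is.na() is TRUE for NaN as well as NA
input_pairs <- toy_counts[!is.na(toy_counts$strength), c("a", "b", "strength")]
input_pairs
```

The second pair shows one way a missing strength can arise: an odds ratio of 0/0 is undefined, and is.na() catches it, just as filter(!is.na(strength)) does in the tidyverse code.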

#load example data set
data("phecode_pairs")

phecode_pairs <- phecode_pairs %>%
  filter(!is.na(strength)) %>% #filter out node pairs with missing values of strength
  arrange(desc(strength)) #sort by strength in descending order

#dimension of the input data
dim(phecode_pairs) 
## [1] 1462623       3
#overview of the phecode pairs data
head(phecode_pairs) %>% 
  knitr::kable()
| a | b | strength |
|:--|:--|--:|
| 296.00 | 300.00 | 131.0360 |
| 381.20 | 389.00 | 129.1788 |
| 173.00 | 216.00 | 127.6532 |
| 173.00 | 702.00 | 125.0173 |
| 636.00 | 655.00 | 122.8753 |
| 381.00 | 389.00 | 122.6985 |

Annotation data

  • Prepare a dataframe with a column id that matches the variables coded in columns a and b of the Phecode pairs data, plus any additional information about the Phecodes (nodes). Here, the description, category (group), and color of each Phecode are added. This information is shown in the description table when you click a subgraph to see its details.
#prepare the annotation data
annotate_node <- tibble(id = unique(c(phecode_pairs$a, phecode_pairs$b))) %>% #one row per unique Phecode, in a column named "id"
  left_join(
    phecode_def %>% dplyr::select(id = phecode, description, group, color), #add additional info
    by = "id"
  ) %>%
  arrange(group)

#overview of the annotation data
head(annotate_node) %>%
  knitr::kable()
| id | description | group | color |
|:--|:--|:--|:--|
| 401.22 | Hypertensive chronic kidney disease | circulatory system | #D14285 |
| 394.00 | Rheumatic disease of the heart valves | circulatory system | #D14285 |
| 411.40 | Coronary atherosclerosis | circulatory system | #D14285 |
| 425.00 | Cardiomyopathy | circulatory system | #D14285 |
| 426.91 | Cardiac pacemaker in situ | circulatory system | #D14285 |
| 425.10 | Primary/intrinsic cardiomyopathies | circulatory system | #D14285 |
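Because the annotation data is linked to the nodes through its id column, a quick consistency check is to list nodes in columns a and b that have no annotation row, and ids that are annotated more than once. The base-R sketch below runs that check on hypothetical stand-ins, toy_pairs and toy_info, in which node "216.00" is deliberately left unannotated.

```r
# Hypothetical stand-ins for the pairs data and the annotation data
toy_pairs <- data.frame(
  a = c("296.00", "381.20", "173.00"),
  b = c("300.00", "389.00", "216.00"),
  strength = c(131.0360, 129.1788, 127.6532),
  stringsAsFactors = FALSE
)
toy_info <- data.frame(
  id = c("296.00", "300.00", "381.20", "389.00", "173.00"),
  description = c("Mood disorders", "Anxiety disorders",
                  "Eustachian tube disorders", "Hearing loss",
                  "Neoplasm of uncertain behavior of skin"),
  stringsAsFactors = FALSE
)

# Every node in columns a and b should have exactly one annotation row
all_nodes <- unique(c(toy_pairs$a, toy_pairs$b))
missing_ids <- setdiff(all_nodes, toy_info$id)          # nodes with no annotation row
duplicated_ids <- toy_info$id[duplicated(toy_info$id)]  # ids annotated more than once
missing_ids     # "216.00" was left out on purpose
duplicated_ids  # empty: each id appears once
```

For annotate_node as built above with left_join(), every node gets a row by construction, so the more useful check there is whether any lookup failed, for example anyNA(annotate_node$description).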

Interactive subgraph visualization

Calculating the subgraph structure for downstream visualization

  • We use calculate_subgraph_structure() to compute the subgraph structure used by the downstream visualization. The subgraph structure is the set of subgraphs formed at every strength value: the associations are sorted in descending order of strength and added one at a time, and the subgraphs present after each addition are recorded as one step.
#calculate subgraph structure
subgraphs <- phecode_pairs %>% 
  calculate_subgraph_structure()

#overview of the subgraph data
subgraphs %>% 
  dplyr::select(-subgraphs) %>%
  head() %>% 
  knitr::kable()
| step | n_edges | strength | n_nodes_seen | n_subgraphs | max_size | rel_max_size | avg_size | avg_density | n_triples |
|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| 1 | 1 | 131.0360 | 2 | 1 | 2 | 1.0000000 | 2.000000 | 1.0000000 | 0 |
| 2 | 2 | 129.1788 | 4 | 2 | 2 | 0.5000000 | 2.000000 | 1.0000000 | 0 |
| 3 | 3 | 127.6532 | 6 | 3 | 2 | 0.3333333 | 2.000000 | 1.0000000 | 0 |
| 4 | 4 | 125.0173 | 7 | 3 | 3 | 0.4285714 | 2.333333 | 0.8888889 | 1 |
| 5 | 5 | 122.8753 | 9 | 4 | 3 | 0.3333333 | 2.250000 | 0.9166667 | 1 |
| 6 | 6 | 122.6985 | 10 | 4 | 3 | 0.3000000 | 2.500000 | 0.8333333 | 2 |
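Several of these summary columns can be reproduced by hand for the first few steps. The base-R sketch below is not the package's implementation: summarize_steps() is a hypothetical helper that adds edges one at a time in descending order of strength, tracks subgraphs with a simple union-find, and reports per-step summaries. The column definitions are inferred from the six rows above, not taken from the package documentation: rel_max_size as max_size / n_nodes_seen, avg_size as the mean subgraph size, and avg_density as the mean over subgraphs of edges / (n(n - 1) / 2) for a subgraph with n nodes. n_triples is left out because its definition cannot be pinned down from these rows alone.

```r
# Hypothetical helper (not part of associationsubgraphs): add edges one at a
# time in descending order of strength and summarize the subgraphs after each step
summarize_steps <- function(a, b, strength) {
  ord <- order(strength, decreasing = TRUE)
  a <- as.character(a[ord])
  b <- as.character(b[ord])
  strength <- strength[ord]

  parent <- character(0)  # union-find parent pointer, named by node id
  size <- integer(0)      # nodes per subgraph, stored at the subgraph's root
  edges <- integer(0)     # edges per subgraph, stored at the subgraph's root

  # Follow parent pointers until reaching the root of x's subgraph
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  rows <- vector("list", length(strength))
  for (i in seq_along(strength)) {
    # A node seen for the first time starts as its own one-node subgraph
    for (node in c(a[i], b[i])) {
      if (!node %in% names(parent)) {
        parent[node] <- node
        size[node] <- 1L
        edges[node] <- 0L
      }
    }
    ra <- find_root(a[i])
    rb <- find_root(b[i])
    if (ra != rb) {  # the new edge joins two subgraphs: merge rb into ra
      parent[rb] <- ra
      size[ra] <- size[ra] + size[rb]
      edges[ra] <- edges[ra] + edges[rb]
    }
    edges[ra] <- edges[ra] + 1L  # count the new edge

    roots <- names(parent)[parent == names(parent)]
    n <- size[roots]   # number of nodes in each subgraph
    e <- edges[roots]  # number of edges in each subgraph
    rows[[i]] <- data.frame(
      step = i,
      n_edges = i,
      strength = strength[i],
      n_nodes_seen = length(parent),
      n_subgraphs = length(roots),
      max_size = max(n),
      rel_max_size = max(n) / length(parent),
      avg_size = mean(n),
      avg_density = mean(e / (n * (n - 1) / 2))
    )
  }
  do.call(rbind, rows)
}

# First six pairs of the sorted Phecode pairs data shown earlier
toy_pairs <- data.frame(
  a = c("296.00", "381.20", "173.00", "173.00", "636.00", "381.00"),
  b = c("300.00", "389.00", "216.00", "702.00", "655.00", "389.00"),
  strength = c(131.0360, 129.1788, 127.6532, 125.0173, 122.8753, 122.6985),
  stringsAsFactors = FALSE
)
steps <- summarize_steps(toy_pairs$a, toy_pairs$b, toy_pairs$strength)
steps
```

On these six pairs, the n_nodes_seen, n_subgraphs, max_size, rel_max_size, avg_size, and avg_density columns of steps agree with the table above. For example, at step 6 the edge 381.00 to 389.00 joins 381.00 to the existing {381.20, 389.00} subgraph, giving two three-node subgraphs with two edges each (density 2/3) and two two-node subgraphs (density 1), so avg_density is 5/6.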

Prepare data for downstream visualization

  • To make the visualization more readable, we replace each Phecode with its description in the Phecode pairs data, and replace the id column of the annotation data with the Phecode descriptions as well. The annotation table is shown when you click a subgraph.
#convert Phecode to Phecode description
phecode_pairs = phecode_pairs %>%
  rename(phecode=a) %>%
  left_join(.,phecode_def[,c("phecode","description")],by="phecode") %>%
  rename(a=description) %>%
  dplyr::select(-phecode) %>%
  rename(phecode=b) %>%
  left_join(.,phecode_def[,c("phecode","description")],by="phecode") %>%
  rename(b=description) %>%
  dplyr::select(-phecode)

#overview of the updated phecode pairs data
phecode_pairs %>% 
  head() %>% 
  knitr::kable()
| strength | a | b |
|--:|:--|:--|
| 131.0360 | Mood disorders | Anxiety disorders |
| 129.1788 | Eustachian tube disorders | Hearing loss |
| 127.6532 | Neoplasm of uncertain behavior of skin | Benign neoplasm of skin |
| 125.0173 | Neoplasm of uncertain behavior of skin | Degenerative skin conditions and other dermatoses |
| 122.8753 | Early or threatened labor; hemorrhage in early pregnancy | Known or suspected fetal abnormality affecting management of mother |
| 122.6985 | Otitis media and Eustachian tube disorders | Hearing loss |
#update annotation data as well
annotate_node = annotate_node %>%
  dplyr::select(-id) %>%
  rename(id=description)
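Since the descriptions now serve as node ids, each Phecode should map to exactly one, unique description; otherwise two different Phecodes would collapse into a single node, or a code without a description would become NA. The base-R sketch below performs the same relabelling with a named-vector lookup instead of the joins above, on hypothetical stand-ins toy_def (for phecode_def) and toy_pairs, and checks both conditions. Unlike the join version, it keeps the a, b, strength column order.

```r
# Hypothetical stand-ins for phecode_def and a few Phecode pairs
toy_def <- data.frame(
  phecode = c("381.00", "381.20", "389.00"),
  description = c("Otitis media and Eustachian tube disorders",
                  "Eustachian tube disorders", "Hearing loss"),
  stringsAsFactors = FALSE
)
toy_pairs <- data.frame(
  a = c("381.20", "381.00"),
  b = c("389.00", "389.00"),
  strength = c(129.1788, 122.6985),
  stringsAsFactors = FALSE
)

# Descriptions become node ids, so they must be unique
stopifnot(anyDuplicated(toy_def$description) == 0)

# Named-vector lookup: Phecode -> description (unname() drops the lookup names)
code_to_desc <- setNames(toy_def$description, toy_def$phecode)
relabelled <- toy_pairs
relabelled$a <- unname(code_to_desc[toy_pairs$a])
relabelled$b <- unname(code_to_desc[toy_pairs$b])

# A Phecode missing from the lookup would show up as NA here
stopifnot(!anyNA(relabelled$a), !anyNA(relabelled$b))
relabelled
```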

Final visualization

#visualize
visualize_subgraph_structure(
  phecode_pairs,
  node_info = annotate_node,
  subgraph_results = subgraphs,
  trim_subgraph_results = TRUE
)

Highlighting a node of interest

  • If you are interested in a particular node in your network, you can “pin” it so that the visualization starts at the step where that node is first added to the visible subgraphs. For instance, to focus on Calculus of kidney, pass its id, "Calculus of kidney", as the pinned_node argument of visualize_subgraph_structure(); the visualization then opens where Calculus of kidney is first grouped into a subgraph.
#visualize
visualize_subgraph_structure(
  phecode_pairs,
  node_info = annotate_node,
  subgraph_results = subgraphs,
  trim_subgraph_results = TRUE,
  pinned_node = "Calculus of kidney"
)
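When edges are added one per step in descending order of strength, a node first joins a subgraph at the step of its strongest association. The base-R sketch below finds that step for a toy edge list made of the first six Phecode pairs shown earlier. first_step() is a hypothetical helper meant only to build intuition; the package's own pinning logic, including any effect of trim_subgraph_results, may differ.

```r
# Toy edge list: the first six Phecode pairs from the start of the tutorial
toy_pairs <- data.frame(
  a = c("296.00", "381.20", "173.00", "173.00", "636.00", "381.00"),
  b = c("300.00", "389.00", "216.00", "702.00", "655.00", "389.00"),
  strength = c(131.0360, 129.1788, 127.6532, 125.0173, 122.8753, 122.6985),
  stringsAsFactors = FALSE
)

# Hypothetical helper: step at which `node` first enters a subgraph, i.e. the
# position of its strongest edge once edges are sorted by descending strength
# (NA if the node never appears)
first_step <- function(pairs, node) {
  pairs <- pairs[order(pairs$strength, decreasing = TRUE), ]
  which(pairs$a == node | pairs$b == node)[1]
}

first_step(toy_pairs, "702.00")  # 4
first_step(toy_pairs, "389.00")  # 2: its strongest edge is to 381.20
```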

Generating publishable web content

  • visualize_subgraph_structure() creates an R htmlwidget, built with r2d3, to host the visualization. This means you can include the code above directly in an .Rmd file and knit it to an HTML file to render the visualization or publish it as web content.